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Abstract 

We present recent results aiming at assessing the coverage properties of Bayesian 
and frequentist inference methods, as applied to the reconstruction of super- 
symmetric parameters from simulated LHC data. We discuss the statistical 
challenges of the reconstruction procedure, and highlight the algorithmic dif- 
ficulties of obtaining accurate profile likelihood estimates. 

1 Introduction 

(N ■ 

Experiments at the Large Hadron Collider (LHC) have already started testing many models of particle 
physics beyond the Standard Model (SM), and particular attention is being paid to the Minimal Super- 
symmetric SM (MSSM) and to other scenarios involving softly-broken supersymmetry (SUSY). 



In the last few years, parameter inference methodologies have been developed, applying both Fre- 
£N) ■ quentist and Bayesian statistics (see e.g., IBS). While the efficiency of Markov Chain Monte Carlo 

(MCMC) techniques has allowed for a full exploration of multidimensional models, the likelihood func- 
tion from present data is multimodal with many narrow features, making the exploration task with con- 
ventional MCMC methods challenging. A powerful alternative to classical MCMC has emerged in the 
form of Nested Sampling Q, a Monte Carlo method whose primary aim is the efficient calculation 
of the Bayesian evidence, or model likelihood. As a by-product, the algorithm also produces samples 
from the posterior distribution. Those same samples can also be used to estimate the profile likelihood. 
MultiNest (H, a publicly available implementation of the nested sampling algorithm, has been shown 
| to reduce the computational cost of performing Bayesian analysis typically by two orders of magnitude 

as compared with basic MCMC techniques. MultiNest has been integrated in the SuperBayeS code 1 
for fast and efficient exploration of SUSY models. 

Having implemented sophisticated statistical and scanning methods, several groups have turned 
their attention to evaluating the sensitivity to the choice of priors |@]|9l[10j and of scanning algorithms 
iflTTl . Those analyses indicate that current constraints are not strong enough to dominate the Bayesian 
posterior and that the choice of prior does influence the resulting inference. While confidence intervals 
derived from the profile likelihood or a chi-square have no formal dependence on a prior, there is a sam- 
pling artifact when the contours are extracted from samples produced from Bayesian sampling schemes, 
such as MCMC or MultiNest [101. 



X 



Given the sensitivity to priors and the differences between the intervals obtained from different 
methods, it is natural to seek out a quantitative assessment of their performance, namely their coverage: 
the probability that an interval will contain (cover) the true value of a parameter. The defining property 
of a 95% confidence interval is that the procedure adopted for its estimation should produce intervals that 
cover the true value 95% of the time; thus, it is reasonable to check if the procedures have the properties 
they claim. While Bayesian techniques are not designed with coverage as a goal, it is still meaningful 
to investigate their coverage properties. Moreover, the frequentist intervals obtained from the profile 
likelihood or chi-square functions are based on asymptotic approximations and are not guaranteed to 
have the claimed coverage properties. 

Here we report on recent studies investigating the coverage properties of both Bayesian and Fre- 
quentist procedures commonly used in the literature. We also highlight the numerical and sampling 
challenges that have to be met in order to obtain a sufficienlty high-resolution mapping of the profile 
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likelihood when adopting Bayesian algorithms (which are typically designed to map out the posterior 
mass, instead). 

For the sake of example, we consider in the following the so-called mSUGRA or Constrained Min- 
imal Supersymmetric Standard Model (CMSSM), a model with fairly strong universality assumptions 
regarding the SUSY breaking parameters, which reduce the number of free parameters to be estimated 
to just five, denoted by the symbol 0: common scalar (mo), gaugino (mi / 2 ) and tri-linear (Aq) mass 
parameters (all specified at the GUT scale) plus the ratio of Higgs vacuum expectation values tan (3 and 
sign(/i), where fi is the Higgs/higgsino mass parameter whose square is computed from the conditions 
of radiative electroweak symmetry breaking (EWSB). 

2 Coverage study of the CMSSM 

2.1 Accelerated inference from neural networks 

Coverage studies require extensive computational expenditure, which would be unfeasible with standard 
analysis techniques. Therefore, in Ref. lfT2l a class of machine learning devices called Artificial Neural 
Networks (ANNs) was used to approximate the most computationally intensive sections of the analysis 
pipeline. 

Inference on the parameters of interest requires relating them to observable quantities, such as 
the sparticle mass spectrum at the LHC, denoted by m, over which the likelihood is defined. This is 
achieved by evolving numerically the Renormalization Group Equations (RGEs) using publicly avail- 
able codes, which is however a computationally intensive procedure. One can view the RGEs simply 
as a mapping from — > m, and attempt to engineer a computationally efficient representation of the 
function. In Ifl2ll . an adequate solution was provided by a three-layer perceptron, a type of feed-forward 
neural network consisting of an input layer (identified with 0), a hidden layer and an output layer (iden- 
tified with the value of m(0) that we are trying to approximate). The weight and biases defining the 
network were determined via an appropriate training procedure, involving the minimization of a loss 
function (here, the discrepancy between the value of m(0) predicted by the network and its correct 
value obtained by solving the RGEs) defined over a set of 4000 training samples. A number of tests on 
the accuracy and noise of the network were performed, showing a correlation in excess of 0.9999 be- 
tween the approximated value of m(0) and the value obtained by solving the RGEs for an independent 
sample. A second classification network was employed to distinguish between physical and un-physical 
points in parameter space (i.e., values of that do not lead to physically viable solutions to the RGEs). 
The final result of replacing the computationally expensive RGEs with the ANNs is presented in Fig. \T\ 
which shows that the agreement between the two methods is excellent, within numerical noise. By using 
the neural network, a speed-up factor of about 3 x 10 4 compared with scans using the explicit spectrum 
calculator was observed. 

2.2 Coverage results for the ATLAS benchmark 

We studied the coverage properties of intervals obtained for the so-called "SU3" benchmark point. To 
this end, we need the ability to generate pseudo-experiments with fixed at the value of the benchmark. 
We adopted a parabolic approximation of the log-likelihood function (as reported in Ref. [ 13 ]), based on 
the measurement of edges and thresholds in the invariant mass distributions for various combinations of 
leptons and jets in final state of the selected candidate SUSY events, assuming an integrated luminosity 
of 1 fb _1 for ATLAS. Note that the relationship between the sparticle masses and the directly observable 
mass edges is highly non-linear, so a Gaussian is likely to be a poor approximation to the actual likelihood 
function. Furthermore, these edges share several sources of systematic uncertainties, such as jet and 
lepton energy scale uncertainties, which are only approximately communicated in Ref. lPT3l . Finally, we 
introduce the additional simplification that the likelihood is also a multivariate Gaussian with the same 
covariance structure. We constructed 10 4 pseudo-experiments and analyzed them with both MCMC 
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Fig. 1: Comparison of Bayesian posteriors obtained by solving the RGEs fully numerically (black lines, giving 
68% and 95% regions) and neural networks (blue lines and corresponding filled regions), from simulated ATLAS 
data. The red diamond gives the true value for the benchmark point adopted. From Ifl2l . 



(using a Metropolis-Hastings algorithm) and MultiNest. Altogether, our neural network MCMC runs 
have performed a total of 4 x 10 10 likelihood evaluations, in a total computational effort of approximately 
2 x 10 4 CPU-minutes. We estimate that the solving the RGEs fully numerically would have taken 
about 1100-CPU years, which is at the boundary of what is feasible today, even with a massive parallel 
computing effort. 

The results are shown in Fig. |2] where it can be seen that the methods have substantial over- 
coverage for 1-d intervals, which means that the resulting inferences are conservative. While it is difficult 
to unambiguously attribute the over-coverage to a specific cause, the most likely cause is the effect of 
boundary conditions imposed by the CMSSM. When is composed of parameters of interest, 9, and 
nuisance parameters, ip, the profile likelihood ratio is defined as 

£(«■</>) 

where rp is the conditional maximum likelihood estimate (MLE) of tp with 6 fixed and 0, ip are the 
unconditional MLEs. When the fit is performed directly in the space of the weak-scale masses (i.e., 
without invoking a specific SUSY model and hence bypassing the mapping — > m), there are no 
boundary effects, and the distribution of —2 In A(m) (when m is true) is distributed as a chi-square with 
a number of degrees of freedom given by the dimensionality of m. Since the likelihood is invariant under 
reparametrizations, we expect — 2 In X(6) to also be distributed as a chi-square. If the boundary is such 

that m(0, 7^ rh or m(6, $) ^ m, then the resulting interval will modified. More importantly, one 
expects the denominator C(9,ip) < C(m) since m is unconstrained, which will lead to — 21nA(6>) < 
—2 In A(m). In turn, this means more parameter points being included in any given contour, which leads 
to over-coverage. 

The impact of the boundary on the distribution of the profile likelihood ratio is not insurmount- 
able. It is not fundamentally different than several common examples in high-energy physics where an 
unconstrained MLE would lie outside of the physical parameter space. Examples include downward 
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Fig. 2: Coverage for various types of intervals for the CMSSM parameters, from 10 4 realizations, employ- 
ing MCMC for the reconstruction (each pseudo-experiment is reconstructed with 10 6 samples). Green/circles 
(red/squares) is for the 68% (95%) error. From [TFJI . 



fluctuations in event-counting experiments when the signal rate is bounded to be non-negative. Another 
common example is the measurement of sines and cosines of mixing angles that are physically bounded 
between [—1,1], though an unphysical MLE may lie outside this region. The size of this effect is related 
to the probability that the MLE is pushed to a physical boundary. If this probability can be estimated, it is 
possible to estimate a corrected threshold on —2 In A. For a precise threshold with guaranteed coverage, 
one must resort to a fully frequentist Neyman Construction. A similar coverage study (but without the 
computational advantage provided by ANNs) has been carried out for a few CMSSM benchmark points 
for simulated data from future direct detection experiments fl4l . Their findings indicate substantial 
under-coverage for the resulting intervals, especially for certain choices of Bayesian priors. Both works 
clearly show the timeliness and importance of evaluating the coverage properties of the reconstructed 
intervals for future data sets. 

3 Challenges of profile likelihood evaluation 

For highly non-Gaussian problems like supersymmetric parameter determination, inference can depend 
strongly on whether one chooses to work with the posterior distribution (Bayesian) or profile likelihood 
(frequentist) HEBEl- There is a growing consensus that both the posterior and the profile likelihood 
ought to be explored in order to obtain a fuller picture of the statistical constraints from present-day and 
future data. This begs the question of the algorithmic solutions available to reliably explore both the 
posterior and the profile likelihood in the context of SUSY phenomenology. 

The profile likelihood ratio defined in Eq. (Q~|) is an attractive choice as a test statistics, for un- 
der certain regularity conditions, Wilks [16 1 showed that the distribution of — 21nA(#) converges to a 
chi-square distribution with a number of degrees of freedom given by the dimensionality of 6. Clearly, 
for any given value of 9, evaluation of the profile likelihood requires solving a maximisation prob- 
lem in many dimensions to determine the conditional MLE tp. While posterior samples obtained with 
MultiNest have been used to estimate the profile likelihood, the accuracy of such an estimate has 
been questioned liTTl . As mentioned above, evaluating profile likelihoods is much more challenging than 
evaluating posterior distributions. Therefore, one should not expect that a vanilla setup for MultiNest 
(which is adequate for an accurate exploration of the posterior distribution) will automatically be op- 
timal for profile likelihoods evaluation. In Ref. flTTl the question of the accuracy of profile likelihood 
evaluation from MultiNest was investigated in detail. We report below the main results. 

The two most important parameters that control the parameter space exploration in MultiNest 
are the number of live points nij ve - which determines the resolution at which the parameter space is 
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Fig. 3: 1-D profile likelihoods from present-day data for the CMSSM parameters normalized to the global best- 
fit point. The red solid and blue dotted vertical lines represent the global best-fit point {\ 2 = 9.26, located in 
the focus point region) and the best-fit point found in the stau co-annihilation region (\ 2 = 11.38) respectively. 
The upper and lower panel show the profile likelihood and A\ 2 values, respectively. Green (magenta) horizontal 
lines represent the la (2a) approximate confidence intervals. MultiNest was run with 20,000 live points and 
tol = 1 x 10~ 4 (a configuration deemed appropriate for profile likelihood estimation), requiring approximately 1 1 
million likelihood evaluations. From ifTTl . 



explored - and a tolerance parameter tol, which defines the termination criterion based on the accuracy 
of the evidence. Generally, a larger number of live points is necessary to explore profile likelihoods 
accurately. Moreover, setting tol to a smaller value results in MultiNest gathering a larger number of 
samples in the high likelihood regions (as termination is delayed). This is usually not necessary for the 
posterior distributions, as the prior volume occupied by high likelihood regions is usually very small and 
therefore these regions have relatively small probability mass. For profile likelihoods, however, getting 
as close as possible to the true global maximum is crucial and therefore one should set tol to a relatively 
smaller value. In Ref. IfTTl it was found that nii ve = 20, 000 and tol = 1 x 10~ 4 produce a sufficiently 
accurate exploration of the profile likelihood in toy models that reproduce the most important features of 
the CMSSM parameter space. 

In principle, the profile likelihood does not depend on the choice of priors. However, in order 
to explore the parameter space using any Monte Carlo technique, a set of priors needs to be defined. 
Different choices of priors will generally lead to different regions of the parameter space to be explored 
in greater or lesser detail, according to their posterior density. As a consequence, the resulting profile 
likelihoods might be slightly different, purely on numerical grounds. We can obtain more robust profile 
likelihoods by simply merging samples obtained from scans with different choices of Bayesian priors. 
This does not come at a greater computational cost, given that a responsible Bayesian analysis would es- 
timate sensitivity to the choice of prior as well. The results of such a scan are shown in Fig. [3l which was 
obtained by tuning MultiNest with the above configuration, appropriate for an accurate profile likeli- 
hood exploration, and by merging the posterior samples from two different choices of priors (see lilTl for 
details). This high-resolution profile likelihood scan using MultiNest compares favourably with the 
results obtained by adopting a dedicated Genetic Algorithm technique IfTTl . although at a slightly higher 
computational cost (a factor of ~ 4). In general, an accurate profile likelihood evaluation was about an 
order of magnitude more computationally expensive than mapping out the Bayesian posterior. 
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4 Conclusions 

As the LHC impinges on the most anticipated regions of SUSY parameter space, the need for statistical 
techniques that will be able to cope with the complexity of SUSY phenomenology is greater than ever. 
An intense effort is underway to test the accuracy of parameter inference methods, both in the Frequentist 
and the Bayesian framework. Coverage studies such as the one presented here require highly-accelerated 
inference techniques, and neural networks have been demonstrated to provide a speed-up factor of up to 
30, 000 with respect to conventional methods. A crucial improvement required for future coverage in- 
vestigations is the ability to generate pseudo-experiments from an accurate description of the likelihood. 
Both the representation of the likelihood function and the ability to generate pseudo-experiments are now 
possible with the workspace technology in RooFit/RooStats lfl"8l . We encourage future experiments to 
publish their likelihoods using this technology. Finally, an accurate evaluation of the profile likelihood 
remains a numerically challenging task, much more so than the mapping out of the Bayesian posterior. 
Particular care needs to be taken in tuning appropriately Bayesian algorithms targeted to the exploration 
of posterior mass (rather than likelihood maximisation). We have demonstrated that the MultiNest 
algorithm can be succesfully employed for approximating the profile likelihood functions, even though it 
was primarily designed for Bayesian analyses. In particular, it is important to use a termination criterion 
that allows MultiNest to explore high-likelihood regions to sufficient resolution. 
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